Abstract. In this paper, we outline our work on developing a disk-based 
infrastructure for efficient visualization and graph exploration operations 
over very large graphs. The proposed platform, called graphVizdb, is 
based on a novel technique for indexing and storing the graph. In 
particular, the graph layout is indexed with a spatial data structure, i.e., 
an R-tree, and stored in a database. At runtime, user operations are 
translated into efficient spatial operations (i.e., window queries) in the 
backend. 

Keywords: graphVizdb, graph data, disk-based visualization tool, RDF graph 
visualization, spatial, visualizing linked data, partition-based graph layout. 

1 Introduction 

Data visualisation provides intuitive ways for information analysis, allowing users 
to infer correlations and causalities that are not always possible with traditional 
data mining techniques. The wide availability of vast amounts of graph-structured 
data, RDF in the case of the Data Web, calls for user-friendly 
methods and tools for data exploration and knowledge uptake. We consider some 
core challenges related to the management and visualization of very large RDF 
graphs; e.g., the Wikidata RDF graph has more than 300M nodes and edges. 

First, their size exceeds the capabilities of memory-based layout techniques 
and libraries, which forces disk-based implementations. Second, graph rendering is 
a time-consuming process; even drawing a small part of the graph (containing 
a few hundred nodes) takes considerable time by the standards of real-time 
systems. The same holds for graph interaction and navigation. Most operations, 
such as zoom in/out and move, are not easily implemented for large dense graphs, 
as they require redrawing and re-laying out large parts of the graph. 

Related works in the field handle very large graphs through hierarchical 
visualization approaches. Although hierarchical approaches provide appealing 
visualizations with low memory requirements, their applicability depends heavily on 
the particular characteristics of the input dataset. In most cases, the hierarchy 
is constructed by exploiting clustering and partitioning methods [1,4,5,14,19]. In 
other works, the hierarchy is defined with hub-based [15] and density-based [22] 
techniques. The system in [3] supports ad-hoc hierarchies that are manually defined 
by the users. Some of these systems offer a disk-based implementation [1,14,19], whereas 
others keep the whole graph in main memory [3,4,5,15,22]. 

Fig. 1. Preprocessing Overview 

In the context of the Web of Data [9,17,16,8,20,2,7], there is a large number 
of tools that visualize RDF graphs (adopting a node-link approach); the most 
notable ones are ZoomRDF [21], Fenfire [12], LODWheel [18], RelFinder [13] 
and LODeX [6]. All these tools require the whole graph to be loaded on the 
UI. Several tools that follow the same non-scalable approach have also been 
developed in the field of ontology visualization [10,11]. 

In contrast to all existing works, we introduce a generic platform, called 
graphVizdb, for scalable graph visualization that does not necessarily depend on 
the characteristics of the dataset. The efficiency of the proposed platform is 
based on a novel technique for indexing and storing the graph. The core idea 
is that, in a preprocessing phase, the graph is drawn using any of the existing 
graph layout algorithms. After drawing the graph, the coordinates assigned to 
its nodes (with respect to a Euclidean plane) are indexed with a spatial data 
structure, i.e., an R-tree, and stored in a database. At runtime, while the user is 
navigating over the graph, specific parts of the graph are retrieved based on 
their coordinates and sent to the user. 
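
The core idea can be illustrated with a minimal sketch. It is not the graphVizdb implementation: to stay self-contained it replaces the R-tree with a uniform grid index, and the class name GridIndex, the cell size, and the sample layout are assumptions made for illustration only.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over the layout plane (a simple stand-in for an R-tree)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(node_id, x, y)]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, node_id, x, y):
        self.cells[self._cell(x, y)].append((node_id, x, y))

    def window_query(self, x_min, y_min, x_max, y_max):
        """Return the ids of nodes whose coordinates fall inside the window."""
        c0, r0 = self._cell(x_min, y_min)
        c1, r1 = self._cell(x_max, y_max)
        hits = []
        # Only the grid cells overlapping the window are inspected.
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                for node_id, x, y in self.cells.get((col, row), ()):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(node_id)
        return hits


# Hypothetical output of the preprocessing phase: node id -> (x, y).
layout = {"a": (10, 10), "b": (120, 40), "c": (900, 700), "d": (130, 60)}

index = GridIndex(cell_size=100)
for node_id, (x, y) in layout.items():
    index.insert(node_id, x, y)

# Only the nodes inside the current viewing window are retrieved.
visible = sorted(index.window_query(100, 0, 200, 100))
print(visible)
```

The point of the sketch is that the cost of a query depends on the size of the window rather than on the size of the whole graph, which is the property the spatial index is meant to provide.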


2 Platform Overview 

The graphVizdb platform is built on top of two main concepts: (1) it is based 
on a “spatial-oriented” approach for graph visualization, similar to approaches 
followed in browsing maps; and (2) it adopts a disk-based implementation for 
supporting interaction with the graph, i.e., a database backend is used to index 
and store graph and visual information. 

Partition-based graph layout. Here we outline the partition-based approach 
adopted by graphVizdb in order to handle very large graphs. Recall that, for 
graph layout, the graph is drawn once in a preprocessing phase, using any of 
the existing graph layout algorithms. However, several graph layout algorithms 
require a large amount of memory in order to draw very large graphs. To 
overcome this problem, we adopt the partition-based approach outlined in Figure 1 
and described next. 

(1) Initially, the graph (RDF) data is divided into a set of smaller sub-graphs 
(i.e., partitions) using a graph partitioning algorithm, which at the same time 
tries to minimize the number of edges connecting nodes in different 
partitions. (2) Then, using a graph layout algorithm, 
each sub-graph resulting from the partitioning is drawn on 
a Euclidean plane, excluding (i.e., not drawing) the edges that connect nodes 
in different partitions (i.e., crossing edges). (3) The drawn partitions 
are organized and combined into a “global” plane using a greedy algorithm whose 
goal is twofold: it ensures that the distinct sub-graphs do not overlap on 
the plane, and at the same time it tries to minimize the total length of the 
crossing edges. (4) Based on the “global” plane, the coordinates of each node and 
edge are indexed and stored in the database. 
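
Step (3) does not fix a particular greedy strategy, so the following is only one possible sketch under stated assumptions: every partition is assumed to fit into a cell of a square grid, the next partition placed is the one most strongly connected to those already placed, and it goes to the free cell that minimizes the weighted Manhattan distance to its neighbours. The function names and the crossing-edge counts are hypothetical.

```python
from itertools import product


def place_partitions(num_parts, crossing, grid_side):
    """Greedily assign each partition to a distinct grid cell.

    crossing maps frozenset({p, q}) to the number of crossing edges
    between partitions p and q (hypothetical input format).
    """
    if grid_side * grid_side < num_parts:
        raise ValueError("grid too small for the number of partitions")

    def weight(p, q):
        return crossing.get(frozenset((p, q)), 0)

    free = list(product(range(grid_side), range(grid_side)))
    # Seed: the partition with the most crossing edges goes to the centre.
    first = max(
        range(num_parts),
        key=lambda p: sum(weight(p, q) for q in range(num_parts) if q != p),
    )
    centre = (grid_side // 2, grid_side // 2)
    placed = {first: centre}
    free.remove(centre)

    while len(placed) < num_parts:
        remaining = [p for p in range(num_parts) if p not in placed]
        # Pick the partition most strongly connected to the placed ones.
        nxt = max(remaining, key=lambda p: sum(weight(p, q) for q in placed))

        def cost(cell):
            # Weighted Manhattan distance to already placed neighbours.
            return sum(
                weight(nxt, q) * (abs(cell[0] - c[0]) + abs(cell[1] - c[1]))
                for q, c in placed.items()
            )

        best = min(free, key=cost)
        placed[nxt] = best
        free.remove(best)
    return placed


def cell_offsets(placed, cell_w, cell_h):
    """Translate grid cells into offsets on the global plane."""
    return {p: (col * cell_w, row * cell_h) for p, (col, row) in placed.items()}


# Hypothetical crossing-edge counts between four partitions.
crossing = {frozenset({0, 1}): 5, frozenset({1, 2}): 3, frozenset({2, 3}): 1}
placed = place_partitions(4, crossing, grid_side=3)
offsets = cell_offsets(placed, cell_w=1000, cell_h=1000)
print(placed)
```

Because each partition occupies its own cell, the sub-graphs cannot overlap as long as the cell dimensions are at least as large as the largest partition drawing; the greedy cost only approximates the goal of short crossing edges.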

Spatial operations for graph exploration. In graphVizdb, most user 
requests are translated into simple spatial operations evaluated over the database. 
In this context, window queries (i.e., spatial range queries that retrieve the 
information contained within a specific spatial region) are the core operation for 
most user requests. The user navigates the graph by moving the viewing 
window. When the window is moved, its new coordinates with respect to the 
whole canvas are tracked on the client side, and a window query is sent to the 
server. The query is evaluated on the server using the R-tree indexes. This way, 
for each user request, graphVizdb efficiently renders only the visible parts of the 
graph, minimizing both the backend-frontend communication cost and the 
rendering and layout time. Additionally, more sophisticated operations, 
e.g., abstraction/enrichment zoom operations, are also implemented using spatial 
operations. 
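
The navigation step can be sketched as follows. This is an illustrative model rather than the prototype's client code: the Viewport class, the clamping to the canvas, and the sample edges are assumptions, and the edge test is a conservative bounding-rectangle filter, not an exact segment intersection.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Viewing window, in canvas coordinates (hypothetical client-side state)."""
    x: float
    y: float
    width: float
    height: float

    def window(self):
        return (self.x, self.y, self.x + self.width, self.y + self.height)

    def pan(self, dx, dy, canvas_w, canvas_h):
        """Move the window, clamped to the canvas, and return the new window."""
        self.x = min(max(self.x + dx, 0), canvas_w - self.width)
        self.y = min(max(self.y + dy, 0), canvas_h - self.height)
        return self.window()


def edge_in_window(edge, window):
    """True if the edge's bounding rectangle overlaps the window."""
    (x1, y1), (x2, y2) = edge
    wx0, wy0, wx1, wy1 = window
    return (min(x1, x2) <= wx1 and max(x1, x2) >= wx0
            and min(y1, y2) <= wy1 and max(y1, y2) >= wy0)


# Hypothetical edges, each given by its two endpoint coordinates.
edges = {
    "e1": ((450, 150), (600, 300)),
    "e2": ((2000, 2000), (2100, 2050)),
}

view = Viewport(x=0, y=0, width=800, height=600)
window = view.pan(500, 100, canvas_w=10000, canvas_h=10000)
visible_edges = [e for e, seg in edges.items() if edge_in_window(seg, window)]
print(window, visible_edges)
```

In the platform described above, the filtering would be done by the server through the spatial index; the linear scan here only shows which elements a window query is expected to return.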

Implementation. We have implemented a graphVizdb prototype which provides 
interactive visualization over large graphs. The prototype offers three main 
operations: (1) interactive navigation, (2) multi-level exploration, and (3) keyword 
search. We use MySQL for data storage and indexing, the Jena framework 
for RDF data handling, Metis for graph partitioning, and Graphviz for drawing 
the graph partitions. In the front-end, we use mxgraph, a client-side JavaScript 
visualization library. A video presenting the basic functionality of our prototype 
is available at: vimeo.com/117547871. 
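
Since the prototype relies on MySQL for storage and indexing, a window query could plausibly be phrased with MySQL's spatial functions, as in the sketch below. The table name edge_geometry and its columns are hypothetical, the prototype's actual schema is not described here, and the %s placeholder assumes a Python MySQL driver that uses that parameter style.

```python
def window_to_wkt(x_min, y_min, x_max, y_max):
    """Encode the viewing window as a closed WKT polygon ring."""
    ring = [
        (x_min, y_min),
        (x_max, y_min),
        (x_max, y_max),
        (x_min, y_max),
        (x_min, y_min),  # a WKT ring must end where it starts
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


# Hypothetical schema: edge_geometry(edge_id, source_id, target_id, geom),
# where geom is a geometry column assumed to carry a SPATIAL index.
WINDOW_SQL = (
    "SELECT edge_id, source_id, target_id "
    "FROM edge_geometry "
    "WHERE MBRIntersects(geom, ST_GeomFromText(%s))"
)


def build_window_query(x_min, y_min, x_max, y_max):
    """Return the SQL text and its parameters for one window query."""
    return WINDOW_SQL, (window_to_wkt(x_min, y_min, x_max, y_max),)


sql, params = build_window_query(0, 0, 800, 600)
print(sql)
print(params)
```

Passing the window as a bound parameter rather than concatenating it into the SQL string keeps the query text constant across user requests.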